Is novel research worth doing? Evidence from peer review at 49 journals

Significance There are long-standing concerns that scientific institutions, which often rely on peer review to select the best projects, tend to select conservative ones and thereby discourage novel research. Using peer review data from 49 journals in the life and physical sciences, we examined whether less novel manuscripts were likelier to be accepted for publication. Measuring the novelty of manuscripts as atypical combinations of journals in their reference lists, we found no evidence of conservatism. Across journals, more novel manuscripts were more likely to be accepted, even when peer reviewers were similarly enthusiastic. The findings suggest that peer review is not inherently conservative, and help explain why researchers continue to do novel work.

Table S1. Percent of citation-hit papers, by novelty group (columns), computed on three different datasets (rows). A "hit" is defined as having an 8-year citation count in the top 5% of that year's citation distribution. Low/high values of tail novelty and median conventionality are defined as below/above the respective median.
The MAG-Scopus results match USMJ qualitatively. The ordering of groups by percent of citation hits is the same for MAG-Scopus 2010 and similar for MAG-Scopus 1990-2000, aside from the two middle groups, which also differ only slightly from each other in the original USMJ. Across all datasets, the exercise supports USMJ's main claim: papers with high tail novelty and high median conventionality display a hit probability about twice as high as that of papers that are high on only one of the two dimensions. Overall, these results suggest that our novelty measure, based on the MAG literature filtered by Scopus journals, behaves similarly to the WoS-based measure and extends to more recent literature.
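To illustrate how such a hit-rate table can be computed, the following is a minimal Python sketch; the DataFrame and its column names (year, c8, tail_novelty, median_conventionality) are hypothetical stand-ins for our data, not the actual pipeline.

import pandas as pd

# Hypothetical input: one row per paper, with publication year, 8-year
# citation count (c8), and the two USMJ dimensions.
df = pd.read_csv("papers.csv")

# A "hit" has an 8-year citation count in the top 5% of its publication year.
df["hit"] = df.groupby("year")["c8"].transform(lambda c: c >= c.quantile(0.95))

# Low/high is defined relative to the median of each measure.
df["high_tail"] = df["tail_novelty"] > df["tail_novelty"].median()
df["high_conv"] = df["median_conventionality"] > df["median_conventionality"].median()

# Percent of hits in each of the four groups.
print(df.groupby(["high_tail", "high_conv"])["hit"].mean().mul(100).round(1))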
B. Validation using research articles vs. reviews
As one validation check of whether USMJ measures novelty, we consider its distribution across article types expected to be more or less novel: primary research (i.e., research articles) and reviews. We expect reviews, being syntheses of primary research from the recent past, to have lower novelty and higher conventionality than primary research. Data on article type were missing for 7,717 submissions across the two Life Sciences journals, and these were excluded from this analysis. Among the observations with a valid article type, 21,190 were primary research and 120 were reviews. Figure S1 below displays boxplots of novelty and conventionality percentiles by article type. As predicted, the novelty of primary research was on average 6.9 percentiles higher than that of reviews (Wilcoxon rank-sum test statistic=2.13, p=0.033) and its conventionality was 13.3 percentiles lower (Wilcoxon rank-sum test statistic=-4.27, p<0.001), supporting the interpretation of the USMJ measures as novelty and conventionality. A minimal code sketch of this comparison appears after Figure S1.

Figure S1. Boxplots of novelty (A) and conventionality (B) percentiles by article type in the Life Sciences journals.
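The sketch below shows how such a comparison can be run; the file and column names (article_type, novelty_pct, conventionality_pct) are illustrative assumptions, not our actual variable names.

import pandas as pd
from scipy.stats import ranksums

# Hypothetical input: one row per submission with article type and the
# novelty/conventionality percentiles.
df = pd.read_csv("life_sciences_submissions.csv").dropna(subset=["article_type"])
primary = df[df["article_type"] == "research"]
reviews = df[df["article_type"] == "review"]

# Wilcoxon rank-sum test of primary research vs. reviews on each measure.
for col in ["novelty_pct", "conventionality_pct"]:
    stat, p = ranksums(primary[col], reviews[col])
    print(f"{col}: statistic={stat:.2f}, p={p:.3f}")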

C. Additional validation from peer review
Additional support for the interpretation of USMJ as a proxy for novelty comes from Bornmann et al. (2019), who calculated a version of USMJ revised by Lee et al. (2015). The Lee et al. revision retains the essence of USMJ but simplifies the normalization of atypical combinations, i.e., it does not randomly rewire the year-specific journal-journal co-citation network as USMJ and the present paper do. Bornmann et al. then compared these revised-USMJ measures to qualitative rankings from members of Faculty of 1000, a post-publication peer review system. A key result is that a one-SD increase in revised-USMJ novelty corresponds to a 7.47% increase in the number of reviewers labeling the paper as presenting a "new finding." This result suggests a correspondence between USMJ and qualitative perceptions of novelty after publication.
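To make the rewiring-based normalization concrete, here is a simplified, hypothetical sketch in Python. All names are ours, and unlike the actual Monte Carlo in Uzzi et al. (2013), this toy shuffle does not preserve the publication year of each cited reference; it only preserves each paper's reference count and each journal's total citation count.

import random
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

def pair_counts(ref_lists):
    """Count co-occurrences of journal pairs within papers' reference lists."""
    counts = Counter()
    for journals in ref_lists:
        counts.update(combinations(sorted(set(journals)), 2))
    return counts

def rewired_counts(ref_lists, n_draws=10, seed=0):
    """Shuffle citations across papers, preserving each paper's number of
    references and each journal's total citation count, then recount pairs."""
    rng = random.Random(seed)
    pool = [j for journals in ref_lists for j in journals]
    sizes = [len(journals) for journals in ref_lists]
    draws = []
    for _ in range(n_draws):
        rng.shuffle(pool)
        it = iter(pool)
        draws.append(pair_counts([[next(it) for _ in range(s)] for s in sizes]))
    return draws

def pair_zscores(ref_lists):
    """z-score of each observed journal pair against the rewired null."""
    obs, draws = pair_counts(ref_lists), rewired_counts(ref_lists)
    z = {}
    for pair, o in obs.items():
        null = [d.get(pair, 0) for d in draws]
        sd = pstdev(null) or 1.0  # guard against zero variance in a toy sample
        z[pair] = (o - mean(null)) / sd
    return z

# A paper's novelty is then the left tail (e.g., 10th percentile) of its
# journal pairs' z-scores; its conventionality is the median.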

Submitted vs. published versions of Cell manuscripts
While it may be desirable to use metrics derived from the submitted versions of manuscripts to study editorial decision-making, our data include the submitted manuscript files (in PDF or Word format) only for submissions to Cell, and only for two years. Lacking submitted versions for most submissions, we relied on the published versions of articles as a proxy for their submitted versions. However, for the subset with both submitted and published versions, we can compare how the novelty metrics change.
This subset consists of 3,717 articles submitted to Cell in 2013-2014. We used the GROBID package (Introduction - GROBID Documentation, n.d.) to parse the manuscript files and extract references from them. GROBID uses the CrossRef API to resolve in-text references into DOIs. We were able to extract references for 2,778 initial submissions.
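For concreteness, a minimal sketch of this extraction step, assuming a GROBID server running locally; the processReferences endpoint and the consolidateCitations option are part of GROBID's REST API, while the file name and helper are illustrative.

import requests

# Assumes a local GROBID server, e.g. started with:
#   docker run -p 8070:8070 grobid/grobid
GROBID_URL = "http://localhost:8070/api/processReferences"

def extract_references(pdf_path):
    """Send a manuscript PDF to GROBID; consolidateCitations=1 asks GROBID
    to resolve each parsed reference against CrossRef, returning DOIs
    where a match is found."""
    with open(pdf_path, "rb") as f:
        resp = requests.post(
            GROBID_URL,
            files={"input": f},
            data={"consolidateCitations": "1"},
            timeout=120,
        )
    resp.raise_for_status()
    return resp.text  # TEI XML with one <biblStruct> per extracted reference

tei_xml = extract_references("submission.pdf")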
Whereas submitted-version novelties may be ideal in theory, in practice, accurately extracting references from PDF and Word files is error-prone. There are three main ways in which novelty calculated from the submitted and published versions may differ:
1. True changes in the references between versions.
2. Extraction errors from submitted versions: our pipeline may have failed to identify or parse references in the submitted version. The GROBID documentation reports F-scores in the range of 0.75-0.89 (Introduction - GROBID Documentation, n.d.).
3. Different reference universes: references included in the calculation of novelty from the published version are those with a valid MAG ID, but not necessarily a DOI. In contrast, references included in the calculation from the submitted version are those with a valid DOI (because they are resolved through CrossRef), but not necessarily a MAG ID.
We display scatter plots of novelties and conventionalities calculated from the published vs. submitted versions of papers in Figure S2. Panel A shows raw novelty (z-scores, calculated as in Uzzi et al. (2013)) for submitted vs. published versions, and Panel B shows percentiles of novelty for published vs. submitted versions. Panels C and D repeat these plots for conventionality.
Figure S2. Scatter plots of novelties (top row) and conventionalities (bottom row), using raw measures calculated as in Uzzi et al. (2013) (left column) and percentiles (right column). In each panel, the x-axis is the measure calculated from the paper's submitted version, and the y-axis is from the published version.
We interpret the much lower correlation for raw novelties and conventionalities as revealing the sensitivity of these metrics to small changes in references, whether due to true changes or to parsing and database errors, as evidenced by several extreme outliers. In contrast, percentiles are robust to outliers and show that novelty and conventionality are relatively consistent across the publication process. Consequently, we rely on percentiles in the rest of our analysis.
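To illustrate why percentiles behave better here, a minimal sketch of the comparison behind Figure S2; the column names sub_raw and pub_raw are hypothetical placeholders for the raw novelty z-scores from the submitted and published versions.

import pandas as pd

# Hypothetical input: one row per paper with raw novelty z-scores from
# the submitted (sub_raw) and published (pub_raw) versions.
df = pd.read_csv("cell_versions.csv")

# Raw z-scores: a few extreme outliers can dominate the Pearson correlation.
print(df["sub_raw"].corr(df["pub_raw"]))

# Percentile ranks within the sample are robust to those outliers.
print(df["sub_raw"].rank(pct=True).corr(df["pub_raw"].rank(pct=True)))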

Full vs. analytic samples
The primary difference between the full and the (smaller) analytic samples for both the Cell Press and IOP datasets is that the analytic samples include only submissions for which we were able to calculate novelty. Calculating novelty for a submission was often not possible because the submission was rejected and no published version could be located (e.g., it was never published), or because it was published but not indexed in MAG.

A. Cell Press journals
Tables S2 and S3 below show how key covariates vary between the full and analytic samples for Cell and Cell Reports. The last column of each table shows the standardized difference, i.e., the difference in means divided by the pooled standard deviation. The differences are modest, never exceeding a third of the pooled standard deviation. The largest differences are in quality, measured by the mean reviewer recommendation (0=reject, 1=accept or revise-and-resubmit), and in the number of references, both of which are higher in the analytic sample. This pattern is likely explained by lower-quality submissions being more likely to be rejected, and subsequently less likely to be matched to MAG, whether because they were never published or were published under a substantially different title.
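A minimal sketch of the standardized-difference computation follows; we assume the equal-weight pooled-SD convention commonly used for covariate balance checks, which may differ in detail from the exact formula behind Tables S2-S5.

import numpy as np

def standardized_difference(analytic, full):
    """Difference in means divided by the pooled standard deviation.
    Pooled SD here is the equal-weight version, sqrt((s1^2 + s2^2) / 2),
    a common convention for balance diagnostics (an assumption, not
    necessarily the exact formula used in the tables)."""
    analytic = np.asarray(analytic, dtype=float)
    full = np.asarray(full, dtype=float)
    pooled_sd = np.sqrt((analytic.var(ddof=1) + full.var(ddof=1)) / 2)
    return (analytic.mean() - full.mean()) / pooled_sd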

B. IOP journals
We exclude from the IOP analytic sample manuscript types that are likely to undergo an unconventional review process or not to present original research, such as Corrigendum (n=580) and Perspective (n=504). The top five manuscript types remaining in the IOP analytic sample are shown in Table S4.

Type                    Count
Paper                   38,112
Letter                   2,309
Special Issue Article    2,095
Topical Review           1,017
Note                       382

Table S4. Manuscript types of submissions to IOP journals in 2018 in the full sample.

Table S5 presents the differences between the IOP analytic and full samples on key covariates. The analytic sample is much smaller than the full sample for two main reasons. First, we attempted to match 2018 submissions, whether accepted or rejected, to the mid-2019 MAG, and the majority (63.1%) were not indexed or located by that time. It is possible they were published and indexed in later versions of MAG, or will be published in the future. Second, an additional 21.5% of submissions were located in MAG but had no data on references.
The standardized differences (last column) are modest, never exceeding a quarter of the pooled standard deviation. The analytic sample has slightly more positive reviewer recommendations and higher citation counts, suggesting that it consists of slightly higher-quality papers. This selection is likely due to higher-quality papers being more likely to be accepted for publication, leading to a higher probability and speed of indexing in MAG.